S-C DisAssembler

The S-C DisAssembler (SCDA) is a ProDOS-based tool for generating
assembly language source code from a binary or system file.  SCDA
operates in conjunction with the S-C Macro Assembler, to convert 6502
and 65C02 machine code into source code files in the "S-C" format.  Here
are some of the features:

*  Input is from one or more binary object files, including file types
BIN and SYS.

*  Output is to one or more "S-C" type (compressed source code) files.

*  Generates comment lines before each label, listing all references to
that label.

*  Disassembly is "script" driven, allowing incremental enhancement as
knowledge is gained about the program being disassembled.

*  Input files may be positioned to specific starting addresses.

*  Decodes ProDOS "MLI" calls as such.

*  Allows pre-named symbols up to 32 characters long.

*  Comes with complete commented source code, allowing you to understand
how it works and make your own personal extensions.

Of all the features, the most important may be the "script".  This is
essentially a "program", written in "disassembly language".  The script
allows you to define which input files to include and which output files
to generate, to name symbols such as monitor entry points and major
subroutines in your program being disassembled, to define table areas,
and even to insert comments.

The script itself is written using the standard S-C Macro Assembler, and
may be saved on a source file just as an assembly language program
would.  As you gain knowledge about the program you are disassembling,
you can add lines to the script.


Disk Contents:

The disk is in ProDOS format, with a volume name of /S.C.DISASM/.  The
disk is not protected in any manner, and is fully copyable.  It is a
good idea to make a backup copy right away, and put the original in a
safe place.

The main file of interest on the /S.C.DISASM/ disk is S.C.DISASM.  This
file is the S-C DisAssembler, ready to run.  I suggest you use FILER,
System Utilities, or some other file copying program to make a copy of
this file on your working disks.  When I am working on a disassembly
project, I usually make a special diskette which contains the object
code I am tearing apart, the tools I am using to do the tearing, and the
resulting source code files.

The SCDA disk also contains all of the source code for the S-C
DisAssembler in the format of the S-C Macro Assembler.  The text file
ASM gives you a short way to run an assembly of all of these source
files; it contains two simple lines:  "LOAD D.ACF" and "ASM".  The file
named D.ACF is a control file, containing mostly ".INB filename" lines
which pull in each of the other source code files during assembly.

I have included several sample scripts with associated binary files,
which you can look at, study, play with, and so on.  SCRIPT.F800 is a
fairly complete script for the Apple monitor from $F800 through $F881. 
It reads BIN.F800 and produces SRC.F800.

SCRIPT.FILER disassembles a particular section of the Apple FILER
program.  The portion is located at the end of the FILER file, starting
at position $5E00 in that file.  This section of code is relocated to
$0800 when FILER is executed, so the disassembly uses an origin of
$0800.  For licensing reasons, I did not include the Apple FILER program
on this disk; however, it is included on the S-C Macro Assembler disk.

SCRIPT.SIDER disassembles the file named B.SIDER.  This is a copy of the
firmware used with the SIDER hard disk (version C).  The script produces
four output files, which can be assembled by using an assembly control
file which names all of those files on ".INB" lines.  The file named
SIDER.ACF is just such a control file.  (Before trying to assemble, you
will have to delete the ".OR $C800" line from the generated file
SIDER.MAIN, because the correct .OR and .TA lines are part of
SIDER.ACF.)

I have also included a script file, SCRIPT.MONDEFS, which defines the
most commonly used equates from the Apple monitor.  You can use it as the
seed for working with large programs which make heavy use of the monitor
entry points.


Operation:

Once you have developed a "script" for a program you wish to
disassemble, a process which is explained below, the rest is easy.  With
the S-C Macro Assembler in operation, the script in memory, and the
files to be disassembled online, you simply BRUN the S-C DisAssembler. 
If the disk containing SCDA is online, just type "BRUN S.C.DISASM"
or "-S.C.DISASM".  If you do not have enough drives for the SCDA file
and your own source and object files to be online at the same time, then
first mount the SCDA disk and type "BLOAD S.C.DISASM"; then mount your
own disk(s) and type "$800G".

When I am working on a disassembly project, I find it useful to set up
the S-C Macro Assembler "." command to start SCDA.  The "." command is
vectored through a JMP instruction at $800F, so you can make it start
the DisAssembler by storing the address $0800 (low byte first) in the
JMP operand at $8010:

     $8010:00 08

Then, any time you wish to execute a disassembly script, simply type a
"." at the colon prompt and hit RETURN.  Just be careful not to use the
command when the DisAssembler is not in memory!
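The patch bytes are "00 08" rather than "08 00" because the 6502 stores 16-bit addresses low byte first. The JMP opcode is at $800F, so its operand occupies $8010-$8011. The following minimal Python sketch derives the monitor patch line. The function names are hypothetical and illustrative only.

```python
def jmp_operand_bytes(target: int) -> list[int]:
    """Two operand bytes of a 6502 JMP to `target`, low byte first."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits")
    return [target & 0xFF, target >> 8]

def monitor_patch(operand_addr: int, target: int) -> str:
    """Format the patch as a monitor store command, e.g. '$8010:00 08'."""
    lo, hi = jmp_operand_bytes(target)
    return f"${operand_addr:04X}:{lo:02X} {hi:02X}"

# The JMP opcode sits at $800F, so the operand starts one byte later.
print(monitor_patch(0x800F + 1, 0x0800))   # $8010:00 08
```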

The S-C DisAssembler operates in two passes:  your script is executed
twice.  During the first pass a symbol table is built in RAM, together
with a cross reference list.  During the second pass the generated
source code is written onto one or more output files.  The source code
is written in S-C Macro Assembler format:  each line has a line number,
followed by label, opcode, and operand fields.  SCDA starts with line
number 0001, and continues with an interval of 1.  The output file will
be type "S-C" in the catalog when you are in the S-C Macro Assembler. 
[When you are in BASIC.SYSTEM, the file type will display as "INT".  In
the Beagle Brothers Applesoft Compiler, the file type will display as
"COM".]

During pass one SCDA displays the message "PASS 1" and three running
addresses.  The first address is the current location counter; the
second, the current address of the top of the cross reference symbol
table; and the third, the current address of the bottom of the
pre-defined symbol table.  These are displayed for your curiosity, and
to give you an idea of how much memory is still available for expanding
your script.  The memory between the two symbol tables is the free
memory.
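The two pass-one addresses can be read as the edges of a shared free region. The cross reference table grows upward, the pre-defined table grows downward, and "MEMORY FULL" results when they meet. The following Python model is an illustration, not SCDA's code. It assumes the first value is the first free byte above the cross reference table and the second is the lowest byte used by the pre-defined table. The sample addresses are hypothetical.

```python
class MemoryFull(Exception):
    pass

class SymbolSpace:
    """Free memory between two tables that grow toward each other."""

    def __init__(self, xref_top: int, predef_bottom: int):
        self.xref_top = xref_top            # assumed: first free byte above xref table
        self.predef_bottom = predef_bottom  # assumed: lowest byte of pre-defined table

    def free(self) -> int:
        return self.predef_bottom - self.xref_top

    def grow_xref(self, nbytes: int) -> None:
        if nbytes > self.free():
            raise MemoryFull("MEMORY FULL")
        self.xref_top += nbytes             # cross reference table grows upward

    def grow_predefined(self, nbytes: int) -> None:
        if nbytes > self.free():
            raise MemoryFull("MEMORY FULL")
        self.predef_bottom -= nbytes        # pre-defined table grows downward

space = SymbolSpace(0x1800, 0x6000)         # hypothetical pass-one readings
print(f"${space.free():04X} bytes free")    # $4800 bytes free
```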

During pass two SCDA displays the message "PASS 2" and one running
address, the current location counter.  You will notice slight pauses
while SCDA is running when it reads from or writes to the disk.
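Running the same script twice can be pictured with a toy Python model. Pass one only collects references, and pass two uses them to place labels and cross reference comments, then numbers the lines from 0001 by 1. Everything here is a simplification. The "script" is a list of already-decoded instructions, the "*(addr) refs" comment layout is an assumption, and separator lines are omitted.

```python
# Toy script: each step is (address, instruction_text, referenced_address_or_None).
def run_script(steps, handle):
    """Execute the same script; called once per pass with a different handler."""
    for address, text, target in steps:
        handle(address, text, target)

def disassemble(steps):
    # Pass 1: build the cross reference table (label address -> referencing addresses).
    xref = {}
    def collect(address, text, target):
        if target is not None:
            xref.setdefault(target, []).append(address)
    run_script(steps, collect)

    # Pass 2: run the script again, now writing source lines.
    body = []
    def write(address, text, target):
        label = ""
        if address in xref:
            refs = " ".join(f"{a:04X}" for a in xref[address])
            body.append(f"*({address:04X}) {refs}")   # cross reference comment
            label = f"I.{address:04X}"
        body.append(f"{label} {text}")                # empty label field -> leading space
    run_script(steps, write)

    # Line numbers start at 0001 and continue with an interval of 1.
    return [f"{n:04d} {line}" for n, line in enumerate(body, start=1)]

steps = [
    (0x0800, "JSR I.0804", 0x0804),
    (0x0803, "RTS", None),
    (0x0804, "RTS", None),
]
for line in disassemble(steps):
    print(line)
```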


Scripts:

A disassembly script is made up of command lines, with one disassembly
command or comment on each line.  You create and modify a script using
the S-C Macro Assembler, with either the standard line editor or the
Laumer Research Full Screen Editor, in the same way you create and
modify assembly language programs.  Scripts can be saved on files using
the SAVE command, and loaded for use with the LOAD command.  The special
"auto-SAVE" comment line which we recommend at the beginning of assembly
language source files will also work in disassembly scripts.

There are currently eleven different commands which can be included in
scripts.  Future versions will include more commands, in order to allow
more precise control over the disassembly of data regions, as well as
other desirable features.  Most of the commands consist of a single
letter, followed by a colon, followed by whatever parameters the command
requires.  Other characters may be included between the command
character and the colon; for example, you may spell out the command or
even make a comment.  Here are the commands:

I:pathname    The I-command tells the DisAssembler to begin reading
object code from the specified pathname.  If you do not specify a
complete pathname, the current ProDOS pathname-prefix will be used.  If
there is no current pathname-prefix, the last-accessed slot and drive
will be used.  You may use more than one object-code input file in a
script.  Each I-command causes the closing of any previous input file,
and the opening of the new one.  The new file is opened positioned at
its first byte; if you wish a different starting position, use the
P-command after the I-command.

              Examples:  1000 I:/HARD1/FILER
                         1000 INPUT:SCASM.SYSTEM
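The pathname rules for the I-command (and the "BAD PATHNAME" error described later) can be sketched as a small Python function. The `prefix` and `last_volume` parameters are hypothetical stand-ins for the ProDOS prefix and the volume found in the most recently accessed drive.

```python
def resolve_pathname(name: str, prefix: str = None, last_volume: str = None) -> str:
    """Complete an I: or O: pathname (illustrative model, not SCDA's code)."""
    if not name:
        raise ValueError("BAD PATHNAME")
    if name.startswith("/"):
        return name                                   # already a complete pathname
    if prefix:
        return prefix.rstrip("/") + "/" + name        # use the current ProDOS prefix
    if last_volume:
        return "/" + last_volume.strip("/") + "/" + name   # last-accessed drive's volume
    raise ValueError("BAD PATHNAME")
```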

O:pathname    The O-command tells the DisAssembler to close any previous
output source-code file, and open a new one.  By using more than one
output file, you can partition the source code into a series of files
which you will assemble using the ".IN" or ".INB" directives.

              Example:  1010 O:SOURCE.FILER

P:hex         The P-command positions the input object-code file at the
specified byte offset in the file.  The hexadecimal value you specify is
passed as the new position in an MLI Set Mark call.

              Examples:  1020 P:5E00
                         1020 POSITION to $800 part:5E00
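The P-command works together with the C- and H-commands because those commands read bytes sequentially from the current mark. After "P:5E00", the code at address $0800 in a "C:800-843" command is the byte at file offset $5E00, as in the FILER example. The following Python sketch models this with `io.BytesIO`. The 24K stand-in data is hypothetical, not the real FILER file.

```python
import io

class InputFile:
    """Sequential object-code input: P: sets the mark, C:/H: read bytes."""

    def __init__(self, data: bytes):
        self.f = io.BytesIO(data)
        self.size = len(data)

    def set_mark(self, position: int) -> None:
        # ProDOS rejects a mark past EOF; BytesIO.seek would not, so check here.
        if position > self.size:
            raise EOFError("POSITION BEYOND END OF FILE")
        self.f.seek(position)

    def read(self, count: int) -> bytes:
        chunk = self.f.read(count)
        if len(chunk) < count:
            raise EOFError("POSITION BEYOND END OF FILE")
        return chunk

filer = bytes(range(256)) * 0x60        # hypothetical 24K stand-in file
inp = InputFile(filer)
inp.set_mark(0x5E00)                    # P:5E00
code = inp.read(0x843 - 0x800 + 1)      # C:800-843 is inclusive: $44 bytes
```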

L:ZIX hex-hex
The L-command defines which prefix letters to use for zero-page symbols
(Z), for symbols within the hex-hex range (I), and for non-zero-page
symbols outside the hex-hex range (X).  I generally use Z, I, and X;
however, you may use
any letters you wish. For example, assume the code I am disassembling is
located during execution between $800 and $3FFF, and I want those
symbols to start with P; further assume I want zeropage labels to begin
with Z, and external references to begin with R; then the command would
be:

              1030 L:ZPR 800-3FFF

The L-command is required if you wish to have any internal labels of the
form "I.xxxx".  Without the L-command to specify the range of internal
labels, they will all be classified as external labels.
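The classification can be sketched in Python for values that are referenced but not named with an =-command. The "I.087B" form matches the delta-equate example later in this document. The two-digit form for zero-page labels, and the choice to test for zero page before the range, are assumptions.

```python
def auto_label(value: int, letters: str = "ZIX", lo: int = None, hi: int = None) -> str:
    """Name an unnamed referenced value under L:<letters> lo-hi (illustrative)."""
    zero_page, internal, external = letters
    if value <= 0xFF:
        return f"{zero_page}.{value:02X}"           # assumed 2-digit zero-page form
    if lo is not None and hi is not None and lo <= value <= hi:
        return f"{internal}.{value:04X}"
    return f"{external}.{value:04X}"                 # no L-command: everything external
```

With the "L:ZPR 800-3FFF" example above, $087B becomes "P.087B" and $FDED becomes "R.FDED".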

XREF:OFF
XREF:ON       The XREF-command lets you turn the cross reference line
generation off, or back on.  If you do not include an XREF command in
your script, cross reference lines will be generated.  If the XREF:OFF
command is placed before any C- or H-commands, no cross reference lines
will be generated.  This saves a lot of memory, and will allow you to
disassemble larger programs with a single script.  Extremely large
programs can be disassembled without the cross reference lines, and then
you can use the S-C XREF program (a separate product available from S-C
Software) to generate a complete cross reference listing.  If you
include several XREF commands, switching the option off and on and off
and on, the cross reference lines will not be complete; they will be
omitted when the option is off, and may only list part of the references
when the option is on.

W:hex         The W-command lets you specify the width of the label
field.  The default width is six, which allows labels up to six
characters long to be written on the same line with the corresponding
opcode or data.  Longer labels will be written on a line by themselves,
with the corresponding opcode or data following on the next line.  You
may specify a value (in hex) from 0 to 3E.  A value of 0 will force all
labels to be written on separate lines, while any value over 1F will
allow all labels to be written on the same line with their opcode or
data.
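The W-command rule reduces to a single length comparison, shown here in Python. Labels are at most 32 characters, so any width of $20 or more keeps every label on its opcode line. The exact column padding SCDA writes is not modeled; a single space stands in for it.

```python
def format_labeled(label: str, code: str, width: int = 6) -> list[str]:
    """Output line(s) for a labeled opcode under W:width (illustrative)."""
    if not 0 <= width <= 0x3E:
        raise ValueError("W: value must be between 0 and 3E hex")
    if len(label) <= width:
        return [f"{label} {code}"]     # label fits in the label field
    return [label, f" {code}"]         # label alone; opcode on the next line
```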

C:hex1-hex2
C:-hex        The C-command tells the S-C DisAssembler to create source
lines for 65C02 or 6502 code, either from hex1 through hex2 or, in the
"C:-hex" form, from the current location through hex.  A future
enhancement to the DisAssembler will be the ability to disassemble
65816 opcodes.  The bytes to be disassembled come from the input
object-code file.  If hex1 differs from the current location, an origin
directive (.OR $hex1) will be generated.

              Examples:  1050 C:800-843
                         1070 C:-89A
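The range handling shared by the C- and H-commands can be modeled in Python. A range is inclusive, so the next current location is the end plus one. An origin line is needed only when the start differs from that location. What SCDA does with "-hex" before any location exists is not documented, and reporting a missing value there is an assumption.

```python
def resolve_range(spec: str, current):
    """Parse 'hex1-hex2' or '-hex'; return (start, end, needs_org)."""
    first, dash, second = spec.strip().partition("-")
    if not dash or not second:
        raise ValueError("MISSING HEX VALUE")
    end = int(second, 16)
    if first:
        start = int(first, 16)
    elif current is not None:
        start = current                      # continuation from the current location
    else:
        raise ValueError("MISSING HEX VALUE")   # assumption: no location yet
    if end < start:
        raise ValueError("HEX RANGE BACKWARDS")
    return start, end, start != current

loc = None
for spec in ["800-843", "844-84F", "-89A", "900-9FF"]:
    start, end, needs_org = resolve_range(spec, loc)
    if needs_org:
        print(f".OR ${start:04X}")           # prints .OR $0800, then .OR $0900
    loc = end + 1                            # ranges include their end address
```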

H:hex1-hex2
H:-hex        The H-command tells the S-C DisAssembler to create source
lines for a data region, either from hex1 through hex2 or, in the
"H:-hex" form, from the current location through hex.  It generates up
to eight bytes per ".HS" line.  Future versions of the DisAssembler will
provide more extensive data-disassembly capability.  The bytes to be
disassembled come from the input object-code file.  If hex1 differs from
the current location, an origin directive (.OR $hex1) will be generated.

              Examples:  1040 H:844-84F
                         1060 Hex:-9FF
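Splitting a data region into ".HS" lines of up to eight bytes can be sketched in Python. This is a simplification. A real data region may also need to break at labels that fall inside it, which this sketch ignores.

```python
def hs_lines(start: int, data: bytes, per_line: int = 8) -> list[tuple[int, str]]:
    """(address, '.HS hexbytes') pairs, up to `per_line` bytes each."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append((start + offset, ".HS " + chunk.hex().upper()))
    return lines

for addr, text in hs_lines(0x844, bytes(range(12))):   # a 12-byte table at $844
    print(f"{addr:04X}  {text}")
```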

Normally C- and H-commands will alternate back and forth in a
disassembly script.

"comment      Lines beginning with a quotation mark generate a comment
line in the output source-code file.  The generated line will use an
asterisk in place of the quotation mark, and the comment will be copied.
For example, the command:

                  "This is a comment.

              would generate, with the appropriate line number:

                  0347 *This is a comment.

=hex,name     Lines beginning with an equal sign generate symbolic
labels to be used for particular hexadecimal values.  All of the
=-commands should come before any C- or H-commands in the script for
best results.  Names for values which are outside the range specified on
the L-command line will cause zeropage and external label equates to be
generated, of the form "symbolname .EQ $value".  The definition lines
will be generated in the appropriate position in the output source-code
file.

Zeropage labels will be generated first, in numeric order.  Next come
any external labels which precede the program being disassembled, in
numeric order.  External labels for values higher than the L-command
range will cause ".EQ" lines to be generated after the C- and H-commands
have all been processed.

Names for values which are inside the L-command range will cause
internal labels to be generated.  Each internal label will be generated
when the location counter reaches the value of that label, during the
processing of the C- and H-commands.  If a label defines the beginning
of an opcode, as is usually the case, that label will be generated in
the label field (if it will fit), or simply be generated on a line by
itself without any ".EQ value" following.  If the label defines a value
inside a multi-byte instruction line, it will be defined after the
instruction line with a ".EQ *-1" or ".EQ *-2" definition.  Values which
are referenced but not given special names by the =-command will receive
labels using your selected prefix letters with the hexadecimal value.

Only one name can be given to a particular value.  If you try to assign
more than one name to the same value, SCDA will quit with the "EXTRA
DEFINITION FOR SAME VALUE" error message.  On the other hand, SCDA will
let you give the same name to more than one value.  When you
re-assemble, however, you will get multiple-definition errors unless the
re-used names are legitimate local labels (a period followed by one or
two digits).

              Examples:  1050 =36,CSWL
                         1060 =37,CSWH
                         1070 =FDED,MON.COUT
                         1080 =28,BASE.ADDRESS
                         1090 =29,BASE.ADDRESS+1
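The =-command rules can be modeled in Python as a one-name-per-value table, followed by the grouping used for output. The grouping is zero-page equates first, then external equates below the L-command range, then internal labels placed inline, then external equates above the range. The sketch assumes the L-command range starts above page zero. The ".EQ" spacing and the name MAIN.LOOP are illustrative.

```python
class ScriptError(Exception):
    pass

def define(symbols: dict, value: int, name: str) -> None:
    """Process one =value,name command."""
    if value in symbols:
        raise ScriptError("EXTRA DEFINITION FOR SAME VALUE")
    symbols[value] = name

def equate_sections(symbols: dict, lo: int, hi: int) -> dict:
    """Group named values as described above (assumes lo > $FF)."""
    def eq(value):
        digits = 2 if value <= 0xFF else 4
        return f"{symbols[value]} .EQ ${value:0{digits}X}"
    values = sorted(symbols)                       # numeric order within each group
    return {
        "zero_page": [eq(v) for v in values if v <= 0xFF],
        "before":    [eq(v) for v in values if 0xFF < v < lo],
        "internal":  [(v, symbols[v]) for v in values if lo <= v <= hi],
        "after":     [eq(v) for v in values if v > hi],
    }

symbols = {}
for value, name in [(0x36, "CSWL"), (0x37, "CSWH"), (0xFDED, "MON.COUT"),
                    (0x28, "BASE.ADDRESS"), (0x0803, "MAIN.LOOP")]:
    define(symbols, value, name)
sections = equate_sections(symbols, 0x800, 0x1FFF)
```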

*comment      Lines beginning with an asterisk are comments within the
script itself, which help to document the script.  These do not affect
the disassembly process in any way.

One special kind of comment line allows the use of the S-C Macro
Assembler's Auto-SAVE feature.  Type the "*", six pairs of
ctrl-O,backspace characters, and then "SAVE" and your filename.  If this
comment line is the first line of your script, typing Escape-S will
cause the SAVE command to be displayed on the screen; typing RETURN will
then execute it.  This technique helps ensure that you always SAVE a
script where it belongs.



The simplest possible script would include only four lines:  one each to
specify an input object-code file and an output source-code file, one to
specify the prefix characters and the range for internal labels, and one
to specify the address range and type of disassembly.  For example,

    1000 I:OBJECT
    1010 O:SOURCE
    1020 L:ZIX 800-1FFF
    1030 C:800-1FFF

Examples of more complicated scripts are included on the SCDA disk.


Automatic *---- lines

The S-C DisAssembler automatically inserts comment lines to separate
subroutines.  The lines are inserted after every RTS opcode and after
every JMP opcode.  The separation lines are the same as those generated within the S-C
Macro Assembler by typing Escape-L after the line number:  an asterisk
followed by a line of dashes.
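The separator rule can be sketched in Python. The opcode values are standard: $60 is RTS, $4C is JMP absolute, $6C is JMP indirect, and $7C is the 65C02 JMP (abs,X). Whether SCDA treats all three JMP forms this way is an assumption, and the dash count here is arbitrary rather than the exact Escape-L width.

```python
RTS = 0x60
JMP_OPCODES = {0x4C, 0x6C, 0x7C}   # JMP abs, JMP (abs), 65C02 JMP (abs,X)
SEPARATOR = "*" + "-" * 30         # dash count chosen arbitrarily here

def add_separators(decoded: list[tuple[int, str]]) -> list[str]:
    """decoded: (opcode_byte, source_text) pairs in address order."""
    out = []
    for opcode, text in decoded:
        out.append(text)
        if opcode == RTS or opcode in JMP_OPCODES:
            out.append(SEPARATOR)      # subroutine boundary
    return out

listing = add_separators([
    (0x20, "JSR X.FDED"),
    (0x4C, "JMP I.0800"),
    (0xA9, "LDA #$00"),
    (0x60, "RTS"),
])
```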


Cross Reference Lines

The S-C DisAssembler is unusual in its ability to generate embedded
cross reference information as comment lines in the output source-code
file.  These lines consist of an asterisk (to signify a comment line),
the address, in parentheses, of the label which follows, and a list of
addresses from which this label is referenced.  If the
address list is too long for an 80-column source line, additional lines
will be generated.  The cross reference lines prove to be extremely
helpful in analyzing disassembled programs.
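The wrapping of a long reference list can be modeled in Python. This sketch wraps before `width` columns of comment text, which ignores the line number. It indents continuation lines so the addresses line up under the first line's list. Both the spacing and the continuation layout are assumptions.

```python
def xref_lines(label_addr: int, refs: list[int], width: int = 80) -> list[str]:
    """Build '*(addr) ref ref ...' comment lines, wrapping before `width`."""
    head = f"*({label_addr:04X})"
    continuation = "*" + " " * (len(head) - 1)   # hypothetical alignment
    lines, current = [], head
    for ref in refs:
        word = f" {ref:04X}"
        if len(current) + len(word) > width:
            lines.append(current)
            current = continuation
        current += word
    lines.append(current)
    return lines
```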

Nevertheless, you may not wish to see the cross reference lines.  You
can turn off this feature using the "XREF:OFF" command in a script.  One
reason for turning it off is to save memory during disassembly for an
extremely large binary file.


Delta lines

The S-C DisAssembler makes every effort to generate every label
necessary to assure error-free re-assembly.  Sometimes this includes
generating lines of the form:

       1234 I.087B .EQ *-1

These are called "delta equates", and you will normally only see these
when you are disassembling self-modifying code, or code that uses values
within operand fields of instructions as data.  Sometimes it occurs
because of offset references to data tables, such as the following:

       LDX POINTER     TABLE INDEX + $80
       LDA TABLE-$80,X

When disassembling the second line, SCDA sees only the operand address,
TABLE-$80, and generates a label for that address.  That address might
fall inside a code area and cause a delta reference.  Whenever you find
a delta reference, or any label that is inside a code area but referred
to as data by the program, you might suspect offset table references.
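The delta in a delta equate follows from where "*" points. On the ".EQ" line after an instruction, "*" is the address of the next instruction, so the delta is that address minus the label's address. The following Python sketch shows the arithmetic for a hypothetical 3-byte instruction at $0879. The helper name is illustrative.

```python
def delta_equate(label: str, insn_addr: int, insn_len: int, target: int) -> str:
    """Equate line for a label that falls inside a multi-byte instruction."""
    next_addr = insn_addr + insn_len   # value of * on the line after the instruction
    if not insn_addr < target < next_addr:
        raise ValueError("target is not inside the instruction's operand bytes")
    return f"{label} .EQ *-{next_addr - target}"

# Hypothetical 3-byte instruction at $0879; its operand bytes are $087A-$087B.
print(delta_equate("I.087B", 0x0879, 3, 0x087B))   # I.087B .EQ *-1
print(delta_equate("I.087A", 0x0879, 3, 0x087A))   # I.087A .EQ *-2
```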


Memory Usage:

The S-C DisAssembler loads at $800, and uses all the memory between
there and the bottom of your script.  Your script is maintained by the
S-C Macro Assembler starting at HIMEM-1, which is usually $73FF, and
going down toward $800.  (If you are using the Laumer Research Full
Screen Editor, it eats up more memory and lowers HIMEM.)

   800-17xx           S-C DisAssembler
   from there up...   the cross reference symbol table
     between          free memory
   from xxxx down     the pre-defined symbol table
   xxxx to 73FF       your script
   7400-BFFF          SCASM.SYSTEM and file I/O buffers

   00-1D              various pointers
   73,74              point to end of script+1  (HIMEM)
   CA,CB              point to beginning of script

Note that running SCDA will overwrite any symbol table left over from an
assembly, and assembling any program will overwrite SCDA.


Error messages:

The S-C DisAssembler aborts disassembly when an error is detected.  You
may see any of the following error messages.  SCDA also displays the
line in the script which caused the error.

NOT A VALID COMMAND...............The first character of a script line
is not one of the valid command or comment characters.

SCRIPT LINE TOO LONG..............The maximum length of a script line is
80 characters, not including the line number.

MISSING COLON.....................The colon after the command character
is missing.  You are allowed to include any other characters you wish
between the command character and the colon; for example, you might
wish to spell out the command names.

MISSING COMMA.....................The comma is missing in the "="
command.

MISSING HEX VALUE.................A required hexadecimal value is
missing.

HEX RANGE BACKWARDS...............The address range in a C- or H-command
is backwards.  The lower address must go first.  If you have specified a
continuation, as in "C:-1234", and the hex value is lower than the
current location, you will get this error message.

EXTRA DEFINITION FOR SAME VALUE...You have tried to give more than one
name to the same value with an "=" command.

MEMORY FULL.......................The predefined symbol table and the
cross reference symbol table have met in memory.  This means you are
going to have to do something to reduce the memory requirements.  One
option is to break the disassembly into separate parts, so that the
symbols will all fit in memory.  Another option is to eliminate the
cross reference lines by using the XREF:OFF command.
BAD PATHNAME......................The pathname on an I- or O-command is
either missing, or does not specify a complete path.  If the volume name
is not specified and there is no prefix, SCDA attempts to complete the
pathname by using the volume name it finds in the most recently accessed
drive.  If there is no ProDOS volume in that drive, you will get this
error message.

OUTPUT FILE WRONG FILE TYPE.......The output file must be of type "S-C"
($FA).  If there is an old file by the same pathname but a different
type, you will get this error.

POSITION BEYOND END OF FILE.......You are attempting to position
(P-command) or read (C- or H-commands) past the end of the input
object-code file.

READ PROBLEM ($XX)................ProDOS MLI error $XX when READing the
input object-code file.

WRITE PROBLEM ($XX)...............ProDOS MLI error $XX when WRITing the
output source-code file.

CREATE PROBLEM ($XX)..............ProDOS MLI error $XX when CREATing the
output source-code file.

OPEN PROBLEM ($XX)................ProDOS MLI error $XX when attempting
to OPEN either the input or output file.

-------------------------
THE END
